Version: 25.06

Classification Rules Reference

Overview

Classification Rules are the core components that contain the detection logic for identifying sensitive data patterns. Each rule combines policy elements, matching elements, and confidence thresholds into a single detection definition.

Rule Structure

Basic Rule Structure

<Entity id="rule-identifier" patternsProximity="300" recommendedConfidence="75">
  <Pattern confidenceLevel="85">
    <IdMatch idRef="pattern-reference"/>
    <Match idRef="supporting-evidence"/>
  </Pattern>
  <Pattern confidenceLevel="65">
    <IdMatch idRef="alternative-pattern"/>
    <Any minMatches="2">
      <Match idRef="keyword-group-1"/>
      <Match idRef="keyword-group-2"/>
      <Match idRef="keyword-group-3"/>
    </Any>
  </Pattern>
</Entity>
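To see how the parts of a rule relate, the structure above can be read programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the rule is a standalone XML fragment without a namespace, which may not hold for a complete rule pack. It lists each pattern's confidence level, its primary IdMatch reference, and its supporting Match references, including those nested inside Any.

```python
import xml.etree.ElementTree as ET

# The basic rule structure from above, as a standalone fragment.
RULE_XML = """
<Entity id="rule-identifier" patternsProximity="300" recommendedConfidence="75">
  <Pattern confidenceLevel="85">
    <IdMatch idRef="pattern-reference"/>
    <Match idRef="supporting-evidence"/>
  </Pattern>
  <Pattern confidenceLevel="65">
    <IdMatch idRef="alternative-pattern"/>
    <Any minMatches="2">
      <Match idRef="keyword-group-1"/>
      <Match idRef="keyword-group-2"/>
      <Match idRef="keyword-group-3"/>
    </Any>
  </Pattern>
</Entity>
"""

def summarize_rule(xml_text):
    """Return (rule id, [(confidence, primary idRef, supporting idRefs), ...])."""
    entity = ET.fromstring(xml_text.strip())
    summary = []
    for pattern in entity.findall("Pattern"):
        primary = pattern.find("IdMatch").get("idRef")
        # iter() walks nested elements too, so Match children of Any are included.
        supporting = [match.get("idRef") for match in pattern.iter("Match")]
        summary.append((int(pattern.get("confidenceLevel")), primary, supporting))
    return entity.get("id"), summary

rule_id, patterns = summarize_rule(RULE_XML)
for confidence, primary, supporting in patterns:
    print(rule_id, confidence, primary, supporting)
```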

Rule Types

Entity Rules

Entity rules detect a specific sensitive data type, such as a credit card number, using one or more high-confidence patterns.

| Attribute | Type | Description | Required | Default | Range |
|---|---|---|---|---|---|
| id | String | Unique rule identifier | Yes | - | Must be unique within rule pack |
| patternsProximity | Integer | Maximum distance between patterns | No | 300 | 1-1000 characters |
| recommendedConfidence | Integer | Recommended confidence threshold | No | 75 | 1-100 |
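The defaults and ranges in the table can be applied when a rule is read. This is a minimal Python sketch, assuming an Entity element without a namespace; the helper name is hypothetical, and uniqueness of id is not checked because that requires the whole rule pack.

```python
import xml.etree.ElementTree as ET

# (default, minimum, maximum) for each optional attribute, from the table above.
ENTITY_ATTRIBUTES = {
    "patternsProximity": (300, 1, 1000),
    "recommendedConfidence": (75, 1, 100),
}

def entity_settings(xml_text):
    """Read Entity attributes, applying documented defaults and checking ranges."""
    entity = ET.fromstring(xml_text)
    if not entity.get("id"):
        raise ValueError("Entity is missing the required id attribute")
    settings = {"id": entity.get("id")}
    for name, (default, low, high) in ENTITY_ATTRIBUTES.items():
        value = int(entity.get(name, default))
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        settings[name] = value
    return settings

print(entity_settings('<Entity id="credit-card-number" recommendedConfidence="85"/>'))
# {'id': 'credit-card-number', 'patternsProximity': 300, 'recommendedConfidence': 85}
```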

Example:

<Entity id="credit-card-number" patternsProximity="300" recommendedConfidence="85">
  <!-- Pattern definitions -->
</Entity>

Evidence Rules

Evidence rules look for supporting context that increases confidence in detections.

<Evidence id="financial-context" patternsProximity="150">
  <Pattern confidenceLevel="40">
    <IdMatch idRef="financial-keywords"/>
  </Pattern>
</Evidence>

Proximity Rules

Proximity rules check for related content within the distance set by patternsProximity.

<Proximity id="payment-context" patternsProximity="200">
  <Pattern confidenceLevel="30">
    <IdMatch idRef="payment-terms"/>
    <Match idRef="amount-patterns"/>
  </Pattern>
</Proximity>

Affinity Rules

Affinity rules detect relationships between different data elements.

<Affinity id="personal-financial" patternsProximity="500">
  <Pattern confidenceLevel="60">
    <IdMatch idRef="ssn-pattern"/>
    <Match idRef="credit-card-pattern"/>
  </Pattern>
</Affinity>

Similarity Rules

Similarity rules find content that closely matches known sensitive content, such as a document fingerprint.

<Similarity id="document-similarity" recommendedConfidence="70">
  <Pattern confidenceLevel="80">
    <IdMatch idRef="document-fingerprint"/>
  </Pattern>
</Similarity>

Pattern Elements

Pattern Structure

Patterns define the specific conditions that must be met for a rule to match.

| Attribute | Type | Description | Required | Range |
|---|---|---|---|---|
| confidenceLevel | Integer | Confidence level for this pattern | Yes | 1-100 |

IdMatch Element

The IdMatch element specifies the primary pattern that must be found.

| Attribute | Type | Description | Required | Example |
|---|---|---|---|---|
| idRef | String | Reference to a pattern or keyword resource | Yes | "credit-card-regex" |

Example:

<IdMatch idRef="ssn-pattern"/>

Match Element

Match elements specify supporting evidence or additional patterns.

| Attribute | Type | Description | Required | Example |
|---|---|---|---|---|
| idRef | String | Reference to a pattern or keyword resource | Yes | "financial-keywords" |

Example:

<Match idRef="payment-keywords"/>

Any Element

The Any element groups several child elements and is satisfied when at least minMatches of them match.

| Attribute | Type | Description | Required | Default | Range |
|---|---|---|---|---|---|
| minMatches | Integer | Minimum number of child patterns that must match | No | 1 | 1 to number of children |

Example:

<Any minMatches="2">
  <Match idRef="keyword-group-1"/>
  <Match idRef="keyword-group-2"/>
  <Match idRef="keyword-group-3"/>
</Any>
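The minMatches rule can be expressed as a simple count. This Python sketch assumes that each child counts once, however many times its keywords appear, and that the set of matched idRef values has already been computed; the helper and its arguments are hypothetical.

```python
def any_satisfied(found_ids, child_ids, min_matches=1):
    """True when at least min_matches of the Any element's children were found.

    found_ids: set of idRef values that matched in the content.
    child_ids: idRef values of the Any element's Match children.
    """
    # Enforce the documented range: 1 to the number of children.
    if not 1 <= min_matches <= len(child_ids):
        raise ValueError("minMatches must be between 1 and the number of children")
    matched = sum(1 for child in child_ids if child in found_ids)
    return matched >= min_matches

children = ["keyword-group-1", "keyword-group-2", "keyword-group-3"]
print(any_satisfied({"keyword-group-1"}, children, min_matches=2))                     # False
print(any_satisfied({"keyword-group-1", "keyword-group-3"}, children, min_matches=2))  # True
```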

Confidence Levels

Confidence Calculation

Confidence levels express how certain the system is that a detection is correct.

| Level | Range | Description | Use Case |
|---|---|---|---|
| Low | 1-40 | Weak indicators | Supporting evidence |
| Medium | 41-70 | Moderate confidence | Contextual matches |
| High | 71-85 | Strong confidence | Primary patterns |
| Very High | 86-100 | Definitive matches | Validated patterns |
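The bands in the table map an integer confidenceLevel to a category. A small Python sketch, with the boundaries copied from the table; note that 85 falls in High, not Very High.

```python
# (minimum, maximum, band name), copied from the confidence level table.
CONFIDENCE_BANDS = [
    (1, 40, "Low"),
    (41, 70, "Medium"),
    (71, 85, "High"),
    (86, 100, "Very High"),
]

def confidence_band(level):
    """Return the band name for an integer confidenceLevel from 1 to 100."""
    for low, high, name in CONFIDENCE_BANDS:
        if low <= level <= high:
            return name
    raise ValueError(f"confidence level {level} is outside 1-100")

for level in (30, 65, 85, 95):
    print(level, confidence_band(level))
```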

Pattern Confidence

Each pattern within a rule has its own confidence level that contributes to the overall match confidence.

<Pattern confidenceLevel="85">
  <IdMatch idRef="validated-ssn-pattern"/>
  <Match idRef="ssn-keywords"/>
</Pattern>
<Pattern confidenceLevel="65">
  <IdMatch idRef="possible-ssn-pattern"/>
  <Any minMatches="2">
    <Match idRef="personal-keywords"/>
    <Match idRef="document-keywords"/>
    <Match idRef="form-keywords"/>
  </Any>
</Pattern>
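This reference does not state exactly how per-pattern levels combine into the overall confidence. The Python sketch below assumes that the strongest matching pattern determines the rule's confidence and that the rule's recommendedConfidence (default 75) acts as the reporting threshold. Both are assumptions, and the function names are hypothetical.

```python
def rule_confidence(pattern_results):
    """Return the highest confidenceLevel among patterns that matched, or None.

    pattern_results: list of (confidence_level, matched) pairs, one per Pattern.
    Assumption: the strongest matching pattern sets the rule's confidence.
    """
    matched_levels = [level for level, matched in pattern_results if matched]
    return max(matched_levels, default=None)

def is_reported(pattern_results, recommended_confidence=75):
    """Assumption: a detection is reported once it reaches recommendedConfidence."""
    confidence = rule_confidence(pattern_results)
    return confidence is not None and confidence >= recommended_confidence

# Only the 65-level pattern (possible SSN plus two keyword groups) matched.
results = [(85, False), (65, True)]
print(rule_confidence(results), is_reported(results))  # 65 False
```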

Matching Elements Reference

Built-in Pattern Types

Regular Expression Patterns

<Regex id="ssn-pattern">
  <Pattern>\b\d{3}-?\d{2}-?\d{4}\b</Pattern>
</Regex>
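To see what this expression accepts, it can be tried with Python's re module, assuming Python's regex dialect behaves like the product's for this simple pattern. Because each dash is optional on its own, mixed forms such as 12345-6789 also match, while \b rejects nine digits embedded in a longer run. This looseness is likely one reason the rule examples pair regular expressions with validation functions and keywords.

```python
import re

# Same expression as the Regex element above; each dash is optional on its own.
SSN_PATTERN = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")

text = "Ref 123-45-6789, alt 123456789, odd 12345-6789, phone 1234567890"
print(SSN_PATTERN.findall(text))
# ['123-45-6789', '123456789', '12345-6789']
```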

Keyword Groups

<Keyword id="financial-terms">
  <Group matchStyle="word">
    <Term>account</Term>
    <Term>balance</Term>
    <Term>payment</Term>
  </Group>
</Keyword>
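The effect of matchStyle="word" can be illustrated with a whole-word search. This Python sketch assumes that "word" means a term must appear as a whole word and that matching is case-insensitive; both are assumptions. The "substring" style is a hypothetical label used only for comparison, not a documented attribute value.

```python
import re

FINANCIAL_TERMS = ["account", "balance", "payment"]

def keyword_hits(text, terms, match_style="word"):
    """Return the terms found in text.

    "word": the term must appear as a whole word (assumed meaning of matchStyle="word").
    "substring": hypothetical comparison mode that finds the term anywhere.
    Case-insensitive matching is an assumption.
    """
    hits = []
    for term in terms:
        if match_style == "word":
            pattern = r"\b" + re.escape(term) + r"\b"
        else:
            pattern = re.escape(term)
        if re.search(pattern, text, re.IGNORECASE):
            hits.append(term)
    return hits

text = "The accountant approved the payment."
print(keyword_hits(text, FINANCIAL_TERMS))                           # ['payment']
print(keyword_hits(text, FINANCIAL_TERMS, match_style="substring"))  # ['account', 'payment']
```

Whole-word matching keeps "accountant" from counting as "account", which is the precision gain the Keyword Optimization guidance refers to.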

Built-in Functions

| Function | Description | Validation |
|---|---|---|
| Func_credit_card_formatted | Credit card with formatting | Luhn algorithm |
| Func_credit_card_unformatted | Credit card without formatting | Luhn algorithm |
| Func_ssn_formatted | SSN with dashes | Format validation |
| Func_ssn_unformatted | SSN without formatting | Format validation |

Example:

<IdMatch idRef="Func_credit_card_formatted"/>
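Both credit card functions validate with the Luhn algorithm. The following Python sketch implements the standard Luhn checksum. Which separators the built-in functions accept, and whether they also check length or issuer prefixes, is not documented here; stripping spaces and dashes is an assumption.

```python
import re

def luhn_valid(number):
    """Luhn checksum over the digits of number; spaces and dashes are ignored."""
    digits = re.sub(r"[ -]", "", number)
    if not re.fullmatch(r"[0-9]+", digits):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0

print(luhn_valid("4111 1111 1111 1111"))  # True  (formatted)
print(luhn_valid("4111111111111111"))     # True  (unformatted)
print(luhn_valid("4111111111111112"))     # False (check digit altered)
```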

Advanced Pattern Matching

Proximity Matching

The patternsProximity attribute specifies how close, in characters, supporting elements must be to the primary match.

<Entity id="credit-card-with-context" patternsProximity="300">
  <Pattern confidenceLevel="90">
    <IdMatch idRef="credit-card-pattern"/>
    <Match idRef="credit-card-keywords"/>
  </Pattern>
</Entity>
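A proximity check can be sketched as a window around the primary match. This reference does not specify exactly how the distance is measured, so the Python sketch below assumes a supporting match counts when it lies entirely within patternsProximity characters before the start or after the end of the primary match. The simple regular expressions stand in for credit-card-pattern and credit-card-keywords and are hypothetical.

```python
import re

def within_proximity(primary_span, supporting_span, proximity=300):
    """True if the supporting match lies inside a window of `proximity`
    characters on either side of the primary match (assumed semantics)."""
    start, end = primary_span
    window_start, window_end = start - proximity, end + proximity
    return supporting_span[0] >= window_start and supporting_span[1] <= window_end

# A keyword right before the number, and another one 400 characters after it.
text = "Card number: 4111 1111 1111 1111" + " " * 400 + "Visa"
card = re.search(r"\d{4}( \d{4}){3}", text)
keyword = re.search(r"\bcard\b", text, re.IGNORECASE)
visa = re.search(r"\bVisa\b", text)
print(within_proximity(card.span(), keyword.span()))  # True
print(within_proximity(card.span(), visa.span()))     # False
```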

Conditional Logic

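Complex conditions can be built by combining IdMatch, Match, and Any elements. In this example, an account number matches only when at least one of the bank or routing keyword groups is also found.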

<Pattern confidenceLevel="75">
  <IdMatch idRef="account-number"/>
  <Any minMatches="1">
    <Match idRef="bank-keywords"/>
    <Match idRef="routing-keywords"/>
  </Any>
</Pattern>

Exclusion Patterns

A Not element excludes a match when its child is found, which reduces false positives such as test data.

<Pattern confidenceLevel="80">
  <IdMatch idRef="ssn-pattern"/>
  <Match idRef="personal-context"/>
  <Not>
    <Match idRef="test-data-keywords"/>
  </Not>
</Pattern>
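The conditional and exclusion examples can be combined into one evaluation routine. This Python sketch uses a hypothetical, simplified model of a Pattern and assumes that direct Match children are all required (AND), each Any needs at least minMatches of its children, and no child of Not may be found. It also assumes proximity has already been applied, so `found` holds only idRefs near the primary match.

```python
from dataclasses import dataclass, field

@dataclass
class PatternSpec:
    """Hypothetical, simplified model of one Pattern element."""
    confidence_level: int
    id_match: str
    matches: list = field(default_factory=list)      # every Match must be found (AND)
    any_groups: list = field(default_factory=list)   # (min_matches, [idRefs]) per Any
    not_matches: list = field(default_factory=list)  # none may be found (Not)

def pattern_matches(spec, found):
    """Evaluate a pattern against the set of idRefs found near the primary match."""
    if spec.id_match not in found:
        return False
    if not all(ref in found for ref in spec.matches):
        return False
    for min_matches, refs in spec.any_groups:
        if sum(ref in found for ref in refs) < min_matches:
            return False
    return not any(ref in found for ref in spec.not_matches)

ssn_spec = PatternSpec(80, "ssn-pattern", matches=["personal-context"],
                       not_matches=["test-data-keywords"])
account_spec = PatternSpec(75, "account-number",
                           any_groups=[(1, ["bank-keywords", "routing-keywords"])])

print(pattern_matches(ssn_spec, {"ssn-pattern", "personal-context"}))  # True
print(pattern_matches(ssn_spec, {"ssn-pattern", "personal-context",
                                 "test-data-keywords"}))               # False
print(pattern_matches(account_spec, {"account-number", "routing-keywords"}))  # True
```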

Rule Examples

Credit Card Detection Rule

<Entity id="credit-card-number" patternsProximity="300" recommendedConfidence="85">
  <Pattern confidenceLevel="95">
    <IdMatch idRef="Func_credit_card_formatted"/>
    <Any minMatches="1">
      <Match idRef="credit-card-keywords"/>
      <Match idRef="payment-keywords"/>
    </Any>
  </Pattern>
  <Pattern confidenceLevel="85">
    <IdMatch idRef="Func_credit_card_unformatted"/>
    <Any minMatches="2">
      <Match idRef="credit-card-keywords"/>
      <Match idRef="payment-keywords"/>
      <Match idRef="financial-keywords"/>
    </Any>
  </Pattern>
  <Pattern confidenceLevel="70">
    <IdMatch idRef="credit-card-regex"/>
    <Any minMatches="3">
      <Match idRef="visa-keywords"/>
      <Match idRef="mastercard-keywords"/>
      <Match idRef="amex-keywords"/>
      <Match idRef="payment-context"/>
    </Any>
  </Pattern>
</Entity>

Social Security Number Rule

<Entity id="ssn-detection" patternsProximity="200" recommendedConfidence="80">
  <Pattern confidenceLevel="90">
    <IdMatch idRef="Func_ssn_formatted"/>
    <Match idRef="ssn-keywords"/>
  </Pattern>
  <Pattern confidenceLevel="75">
    <IdMatch idRef="Func_ssn_unformatted"/>
    <Any minMatches="2">
      <Match idRef="ssn-keywords"/>
      <Match idRef="personal-keywords"/>
      <Match idRef="government-keywords"/>
    </Any>
  </Pattern>
</Entity>

Bank Account Number Rule

<Entity id="bank-account-number" patternsProximity="250" recommendedConfidence="75">
  <Pattern confidenceLevel="85">
    <IdMatch idRef="bank-account-regex"/>
    <Any minMatches="1">
      <Match idRef="routing-keywords"/>
      <Match idRef="bank-keywords"/>
    </Any>
  </Pattern>
  <Pattern confidenceLevel="70">
    <IdMatch idRef="account-number-pattern"/>
    <Any minMatches="2">
      <Match idRef="banking-keywords"/>
      <Match idRef="financial-keywords"/>
      <Match idRef="account-keywords"/>
    </Any>
  </Pattern>
</Entity>

Performance Optimization

Pattern Ordering

Order patterns by confidence level (highest first) for optimal performance.

<Entity id="optimized-rule">
  <!-- Highest confidence pattern first -->
  <Pattern confidenceLevel="95">
    <IdMatch idRef="high-confidence-pattern"/>
  </Pattern>
  <!-- Lower confidence patterns follow -->
  <Pattern confidenceLevel="75">
    <IdMatch idRef="medium-confidence-pattern"/>
    <Match idRef="supporting-evidence"/>
  </Pattern>
</Entity>
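One plausible reason ordering helps: if the strongest matching pattern determines the result, an evaluator that walks patterns in document order can stop at the first match, but only when patterns are ordered highest first. This Python sketch illustrates that reasoning under those assumptions; it simplifies matching to "the primary idRef was found", and the function name is hypothetical.

```python
def evaluate_in_order(patterns, found):
    """Evaluate patterns in document order, stopping at the first match.

    patterns: list of (confidence_level, primary_idRef) in document order.
    found: set of idRefs present in the content (simplified matching).
    Returns (confidence of first matching pattern or None, patterns evaluated).
    The early exit yields the highest confidence only if the list is ordered
    highest confidence first.
    """
    for evaluated, (level, primary) in enumerate(patterns, start=1):
        if primary in found:
            return level, evaluated
    return None, len(patterns)

found = {"high-confidence-pattern", "medium-confidence-pattern"}
ordered = [(95, "high-confidence-pattern"), (75, "medium-confidence-pattern")]
print(evaluate_in_order(ordered, found))                  # (95, 1)
print(evaluate_in_order(list(reversed(ordered)), found))  # (75, 1): stops too early
```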

Proximity Settings

Choose patternsProximity values based on the type of content being scanned:

| Content Type | Recommended Proximity | Reason |
|---|---|---|
| Structured Forms | 100-200 characters | Fields are close together |
| Documents | 300-500 characters | Context may be spread out |
| Email Content | 200-400 characters | Mixed structured/unstructured |
| Database Exports | 50-150 characters | Highly structured data |

Keyword Optimization

  • Prefer specific keywords to generic terms
  • Group related keywords together
  • Limit keyword lists to essential terms
  • Use word matching (matchStyle="word") for better precision

Validation and Testing

Rule Validation

  1. Syntax Validation: Ensure the XML is well-formed
  2. Reference Validation: Verify that every idRef attribute points to a defined resource
  3. Logic Validation: Check that the pattern logic is sound, for example that minMatches does not exceed the number of children in an Any element
  4. Performance Testing: Test with representative content
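The first two validation steps can be automated. This Python sketch parses a rule pack with xml.etree.ElementTree, reports parse errors, duplicate ids, and idRef values that resolve neither to a defined id nor to a built-in function from the table. The <Rules> wrapper element is hypothetical, and a real rule pack's schema and namespace may differ. The sample deliberately misnames the keyword resource so the reference check fires.

```python
import xml.etree.ElementTree as ET

RULE_PACK = """
<Rules>
  <Entity id="ssn-detection" patternsProximity="200" recommendedConfidence="80">
    <Pattern confidenceLevel="90">
      <IdMatch idRef="Func_ssn_formatted"/>
      <Match idRef="ssn-keywords"/>
    </Pattern>
  </Entity>
  <Keyword id="ssn-kw">
    <Group matchStyle="word"><Term>ssn</Term></Group>
  </Keyword>
</Rules>
"""

# Built-in function names from the Built-in Functions table.
BUILT_IN_FUNCTIONS = {
    "Func_credit_card_formatted", "Func_credit_card_unformatted",
    "Func_ssn_formatted", "Func_ssn_unformatted",
}

def validate_rule_pack(xml_text):
    """Return a list of problems found by the syntax and reference checks."""
    try:
        root = ET.fromstring(xml_text.strip())
    except ET.ParseError as exc:
        return [f"XML is not well-formed: {exc}"]
    ids = [el.get("id") for el in root.iter() if el.get("id")]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    problems = [f"duplicate id '{i}'" for i in duplicates]
    defined = set(ids)
    for el in root.iter():
        ref = el.get("idRef")
        if ref and ref not in defined and ref not in BUILT_IN_FUNCTIONS:
            problems.append(f"{el.tag} references undefined idRef '{ref}'")
    return problems

print(validate_rule_pack(RULE_PACK))
# ["Match references undefined idRef 'ssn-keywords'"]
```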

Testing Methodology

  1. Positive Testing: Verify rules match intended content
  2. Negative Testing: Ensure rules don't match unintended content
  3. Boundary Testing: Test edge cases and limits
  4. Performance Testing: Measure processing time and resource usage
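Positive, negative, and boundary tests can share one small harness. This Python sketch runs the SSN regular expression from the Regular Expression Patterns section against illustrative samples; the samples and the helper are hypothetical, and a real suite would use representative content and evaluate whole rules rather than a single expression.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")

# (sample text, should the pattern match?); samples are illustrative only.
TEST_CASES = [
    ("Employee SSN: 123-45-6789", True),    # positive
    ("SSN 123456789 on file", True),        # positive, unformatted
    ("Call 555-0100 for support", False),   # negative
    ("Order 1234-567-890 shipped", False),  # negative
    ("Account 1234567890", False),          # boundary: ten digits
]

def run_tests(pattern, cases):
    """Return (text, expected, actual) for every case that did not behave as expected."""
    failures = []
    for text, expected in cases:
        actual = pattern.search(text) is not None
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

print(run_tests(SSN_PATTERN, TEST_CASES))  # []
```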

Best Practices

Rule Design

  1. Start Simple: Begin with basic patterns, add complexity gradually
  2. Use Multiple Patterns: Provide patterns at different confidence levels
  3. Include Context: Use supporting keywords for better accuracy
  4. Test Thoroughly: Validate with real-world content samples

Performance

  1. Optimize Proximity: Use smallest effective proximity values
  2. Order Patterns: Place highest confidence patterns first
  3. Limit Complexity: Avoid overly complex logical conditions
  4. Monitor Performance: Track rule execution times

Maintenance

  1. Version Control: Track rule changes over time
  2. Regular Review: Periodically assess rule effectiveness
  3. Update Keywords: Keep keyword lists current
  4. Performance Monitoring: Watch for degradation over time